Building a Large-scale Software Programming Taxonomy from Stackoverflow

Authors

  • Jiangang Zhu
  • Beijun Shen
  • Xuyang Cai
  • Haofen Wang
Abstract

Taxonomies are becoming indispensable to a growing number of software engineering applications, such as software repository mining and defect prediction. However, the existing taxonomies in this area are all manually constructed, so they are small and shallow. To show the full potential of taxonomies in software engineering applications, we present the first large-scale software programming taxonomy, which is more comprehensive than any existing one. It contains 38,205 concepts and 68,098 subsumption relations. Instead of learning from an open domain, we focus on taxonomy construction from Stackoverflow, one of the largest QA websites about software programming. We propose a machine learning based method with novel features to create a taxonomy that captures the hierarchical semantic structure of tags in Stackoverflow. The method runs iteratively to find as many relations as possible. Experimental results show that our approach achieves much better accuracy than the baselines. Compared with software programming taxonomies extracted from general-purpose taxonomies such as WikiTaxonomy, Yago Taxonomy and Schema.org, our taxonomy has the widest coverage of concepts, contains the largest number of subsumption relations, and has the deepest semantic hierarchy.

Keywords: Taxonomy Construction, Stackoverflow, Software Engineering
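
To make the quantities in the abstract concrete (concepts, subsumption relations, hierarchy depth), the following is a minimal sketch of how a tag taxonomy could be stored and measured. It is not the authors' method: the tag names, the `(child, parent)` pair representation, and the helper functions `concepts` and `depth` are all assumptions made for illustration, and the depth computation assumes the relations form a directed acyclic graph.

```python
from collections import defaultdict

# Hypothetical (child, parent) subsumption relations between tags.
# These pairs are illustrative only and are not taken from the paper's data.
relations = [
    ("python", "programming-language"),
    ("java", "programming-language"),
    ("django", "python"),
    ("django", "web-framework"),   # a concept may have more than one parent
    ("django-models", "django"),
]


def concepts(relations):
    """Return the set of all tags that appear in any relation."""
    return {tag for pair in relations for tag in pair}


def depth(relations):
    """Return the number of edges on the longest root-to-leaf path.

    Assumes the relations are acyclic; a cycle would recurse forever.
    """
    children = defaultdict(set)
    for child, parent in relations:
        children[parent].add(child)

    # Roots are concepts that never appear as a child.
    roots = concepts(relations) - {child for child, _ in relations}
    memo = {}

    def longest(node):
        if node not in memo:
            memo[node] = max(
                (1 + longest(c) for c in children.get(node, ())), default=0
            )
        return memo[node]

    return max((longest(root) for root in roots), default=0)


print(len(concepts(relations)))  # 6 concepts
print(len(relations))            # 5 subsumption relations
print(depth(relations))          # 3 (programming-language > python > django > django-models)
```

Storing relations as pairs rather than as a tree keeps multiple parents per concept representable, which a taxonomy derived from co-occurring tags is likely to need.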

Similar resources

Software.zhishi.schema: A Software Programming Taxonomy Derived from Stackoverflow

In this paper, we are the first to construct a software programming taxonomy from Stackoverflow. More precisely, we propose a machine learning based method with novel features to capture the hierarchical semantic structure of tags in Stackoverflow. A graph pruning algorithm is applied to eliminate the conflicts by constructing a Directed Acyclic Graph (DAG). As a result, our dataset, named Soft...

A New Compromise Decision-making Model based on TOPSIS and VIKOR for Solving Multi-objective Large-scale Programming Problems with a Block Angular Structure under Uncertainty

This paper proposes a compromise model, based on a new method, to solve the multi-objective large-scale linear programming (MOLSLP) problems with block angular structure involving fuzzy parameters. The problem involves fuzzy parameters in the objective functions and constraints. In this compromise programming method, two concepts are considered simultaneously. First of them is that the optimal ...

Building a Domain Knowledge Base from Wikipedia: a Semi-supervised Approach

Knowledge bases are becoming indispensable to software engineering and knowledge engineering. However, the existing domain knowledge bases are always artificially constructed and small-scale. In this paper, we propose a semi-supervised approach to domain concepts detection and software engineering knowledge base construction from Wikipedia. First, the approach selects domain relevant tags from ...

A Compromise Decision-Making Model Based on TOPSIS and VIKOR for Multi-Objective Large-Scale Nonlinear Programming Problems with a Block Angular Structure under Fuzzy Environment

This paper proposes a compromise model, based on a new method, to solve the multi-objective large-scale linear programming (MOLSLP) problems with block angular structure involving fuzzy parameters. The problem involves fuzzy parameters in the objective functions and constraints. In this compromise programming method, two concepts are considered simultaneously. First of them is that the optimal alter...

A Compromise Decision-making Model for Multi-objective Large-scale Programming Problems with a Block Angular Structure under Uncertainty

This paper proposes a compromise model, based on the technique for order preference through similarity ideal solution (TOPSIS) methodology, to solve the multi-objective large-scale linear programming (MOLSLP) problems with block angular structure involving fuzzy parameters. The problem involves fuzzy parameters in the objective functions and constraints. This compromise programming method is ba...

Publication date: 2015